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Abstract 

The structured low-rank approximation problem for general affine structures, weighted 2-norm 
and fixed elements is considered. The variable projection priciple is used to reduce the dimen- 
sionality of the optimization problem. Algorithms for evaluation of the cost function, the gra- 
dient and approximation of the Hessian are developed. For m x n mosaic Hankel matrices the 
algorithms have complexity 0(m 2 n). 
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1. Introduction 

An affine matrix structure is an affine mapping from a structure parameter space W lp to a 
space of matrices R mxn : 

,y( P ) = s + J2pA, s k e R mx ". (y) 

Throughout the paper we assume that m < n. The structured low-rank approximation is the 
Droblem of finding the best low-rank structure-preserving approximation of a given data matrix 



problei 

Bi. 



Problem 1 (Structured low -rank approximation). Given an affine structure 5?, data vector pd G 
W lv , norm || • || and natural number r < m 

minimize \\pd ~ p\\ subject to rank.y(p)<r. (SLRA) 

In this paper, we consider the case of weighted 2-norm, given by 

\\p\\ 2 w := p T Wp, Wel^"', (H-lllr) 

where W is 
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• either a symmetric positive definite matrix, 

• or a diagonal matrix 

W = diag(u>i ,...,w n ), Wi e(0;+oo], 



(w -> W) 



where +00 • = by convention. A finiteness constraint ||j»d ~ p|lw < +°° * s a ddi- 
tionally imposed on (ISLRAb . Problem (ISLRAb with the weighted norm fl|j * iFv^P given by 



iw — s- Wl l is equivalent to structured low-rank approximation with elementwise weighted 
2-norm 

and a set of fixed values constraints 

(Pd)i = Pi for all i with Wi = +00. 

The structured low-rank approximation problem with weighted 2-norm appears in signal pro- 
cessing, computer algebra, identification of dynamical systems, and other applications. We refer 
the reader to (fTJ, 0] for an overview. In this paper, we consider general affine structures &,y\ and, 
in particular, structures that have the form 

,y( P ) = sjfe^GO, (i^j 

where $ is a full row rank matrix and ^n jn is a mosaic Hankel matrix structure yfl. Many 
applied problems can be reduced to (ISLRAb with the structure ( $J%° m , n 1 and weighted norm, 
defined by Aw ->■ WL see 10, [H. 



1.1. Mosaic Hankel structure 

A mosaic Hankel matrix structure ,^f m . n is a map defined by two integer vectors 

• n N ] G 



mi 



m q ] € N q 



as follows: 



and 

'^ ? m q ,n 1 (p^' "0 



n = \m 

^mi ,njy \y ) 



where 

p = co\(p^\...,p^\...,p^ N \.... 
is the partition of the parameter vector, and J$? m ,n '■ 

~Pl P2 P3 



pm+n— 1 



p( k , 1 ) g Jjm*+ni-l 



(m,n) 



is the Hankel structure 



P2 
P3 



P3 



_Prn Prn-\-l 



Pn 
Pn+1 



Pm+n-2 
Pvn+n — 2 Pm+n—\. 



Note that the number of parameters for ( ,^ m:n I is equal to 



N J2 



k=l 



N 

1=1 



Nq, 



N 



the number of columns is equal to ^ rrik, and the number of rows is 2~2 n i- The mosaic Hankel 



k=l 



1 = 1 



Ai J. | | L _L 

structure is a generalization of the block-Hankel structure |3J], which is defined as 

C 2 ••• 



Ci 

c 2 



Cl Cl+i 



C k 
Ck+i 

Cl+k- 



where Ck are q x N matrices. Indeed, consider permutation matrices 

$ := [It O ex ... Il® e q ] T , 
* := [I K ® ex ... (8 ejv] , 

where ® is the Kronecker product. Then the block-Hankel structure (J^L.if(C) 
formed to a mosaic Hankel structure iJ^ m 



can be trans- 



by permutation of the rows and columns 

K]{p), (^L,K^^) 



IK 



where the parameter vector p is defined as (jpj with p. 



(Ci)k,i- The parameter mapping 



corresponds to an unfolding of the 3-dimensional tensor defined by the matrices Ck- 

1.2. Optimization methods and the variable projection 

Problem ( ISLRAt is nonconvex and except for a few special cases (e.g. for unstructured ma- 
trices and Frobenius norm, for circulant matrices, and for some classes of square matrices, see 
10,01) has no closed-form solution. 

Different optimization methods have been developed for different structures and approxima- 
tion criteria, see yj] for a historical overview. Many optimization methods use the fact that the 



pe/xr 



satis- 



rank constraint rank D < r is equivalent to existence of a full row rank matrix R S 
fying RD = 0, where d :— m — r is the rank reduction in ( I SLR Al l. Problem ( ISLRAl i then can 
be rewritten (for weighted 2-norm) as 



minimize 



Ibd-Pl 



2 

W 



subject to ranki? = d and RS"(p)=0. (SLRA^) 



Methods for ( |SLRA/;| ) include Riemannian SVD structured total least norm approach 10, 0], 
and variable projection. Note that most of the optimization methods mentioned above were 
developed for the structured total least squares problem, which is the problem (|SLRAr]i with an 
additional contraint 

R=[X -I d ) , X G M. dxr . (STLS) 

Most of the methods are designed for 2-norm approximation criteria. 
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The variable projection approach was proposed in for separable non-linear least squares 
problems. Variable projection was first applied for some special cases of flSLRAfl) M\m. In 
the variable projection aproach, dSLRAj?| i is rewritten as 

minimize f(R), where (OUTER) 

ReR dxm , rank R=d 

/(ii) := f min bd-pllw subject to Ry{p) = o) . (f(R)) 



The inner minimization problem (/(i?) I is a linear least-norm problem |12jl and has a closed 



form solution (see also iQj). Therefore ( ISLRAt is reduced to optimization of the function 
on a space of dimension dm, which is typically much smaller than the dimension n p of the 
eliminated variable p. 

The cost function f(R) depends only on the subspace spanned by the rows of the argument R, 
i.e. f(Ri) = /(-R2) for rowspani?i = rowspani?2- Therefore, f(R) can be considered as 
a function defined on the Grassmann manifold [13] of all d-dimensional subspaces of 1™, and 
the problem (IQUTERb is optimization on a Grassmann manifold. The optimization problem on 
a Grassmann manifold can be either transformed to an optimization problem on an Euclidean 
space ( see B14 Sec. 2]), or can be solved by iterations in the tangent space (see, for example, 
linx[l3IO . Therefore, standard optimization routines can be used to minimize ( IQUTERt . 

The computation of the cost function has complexity 0(n 3 ) if the inner minimization prob- 



lem is solved by general-purpose methods (e.g. the QR decomposition 11511 '). For analytic 

computation of derivatives of f(R), which can speed up the convergence of local optimization 
methods, only algorithms with complexity 0(n 3 ) were proposed in the case of general affine 
structure 1 1611 . 

In Ml li 1 1 '711 it was shown that the variable projection approach leads to efficient (with com- 
plexity 0(mn)) local optimization methods for solution of dSLRAb (with dSTLSb restriction on 

R) with 2-norm for structures of type [C^ • • • C^] J , where each block is block- 
Toeplitz, block-Hankel or unstructured, and only whole blocks can be fixed. 

1.3. Main results and composition of the paper 

In this paper we show that the cost function f(R), its gradient and an approximation of 
its Hessian can be evaluated in 0(m 2 n) operations for structure {^,^f m . n \ and elementwise 



weighted 2-norm (that allows fixed values constraints). If the weights are block-wise (or only 
whole blocks are fixed), the cost function and the gradient can be evaluated in 0(mn) operations, 
as in (HI. 

We develop algorithms for evaluation of f(R) and its gradient for general affine structure 



and arbitrary weighted 2-norm. The structure of f(R) is derived in a similar way to [ 11|], but 
in contrast to 01 ll Il70 we do not use a probabilistic interpretation of ( 1 SLR Al l. Instead, we show 
how the matrix structure i5^\ is mapped to the structure of the cost function. In addition, our 
definition of weighted 2-norm incorporates fixed values constraints, which also simplifies the 
derivation of the cost function structure. 

We provide an explicit derivation of the approximation of the Hessian of f(R), for general 



affine structure and weighted 2-norm. This derivation was omitted in [ 17] and other papers, but 
was used in the computational routines in iflill . We provide two variants of the approximation of 
the Hessian, based on two different representations of f(R) as a sum of squares. 
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The paper is organized as follows. In Section [2] we review some basic properties of affine 
structures and weighted 2-norm. Section[3]covers the derivation of f(R) for general affine struc- 
ture and weighted 2-norm. For blocked structures the cost function is expressed via cost functions 
for the blocks. Based on results of Section[3] we develop general-purpose algorithms for evalu- 
ation of the cost function and its derivatives in Section[4] Finally, in Section[5]we specialize the 
algorithms for mosaic Hankel structures and derive their computational complexity. 

1.4. Notation 

Some of the notation used in the paper is summarized in Table Q] 

— vector of zeros 

1 — vector of ones 
e, — ith unit vector 

col(i>i ,vn) — concatenation of vectors v\ , . . . , vn 

blkdiag(Ai, . . . , An) — block-diagonal matrix composed of matrices At,..., An 

vec — column-wise vectorization of a matrix 

W _1 — either inverse of W if W is positive definite, 



or diag(7i, . . . 7„ p ), where j k 



W k 1 , Wk < +oo, 
w k = +oo, 



if W is given by ( Iw — >■ Wb 
La — a right Cholesky factor of a positive semi-definite matrix A 

(a lower-triangular matrix that satisfies L a La = A), 
La is unique if A is positive definite 



Table 1: Notation 



2. Affine structures and weighted 2-norms 



2.1. Basic properties 

Affine structures can be defined in an equivalent to t5^\ vector form 



vec y(p) — vec So + p, 



where 



[vec St 



vec S,, 



(vecJ?0 



(S^) 

"J" -i rn , n 



is the matrix representation of the linear part of t5^\ in the basis of elementary matrices {e.;e J 
Example 1. 2 x 3 Hankel matrices J%2.3(p) can be represented as ( iJ^l i with 

So = 0, St = 
In this case 



1 





0" 


, s 2 = 


"0 


1 


0" 


, ^3 = 


"0 





r 










0" 











1 











1 











1 



1 



where blank elements denote zeros. 

Using the representation ( Ivec y\ . we show that the Frobenius norm is a weighted 2-norm. 
Note 1. Let jy\i be an infective map (which corresponds to linearly independent \Sk\uL\ or 



( |S,j^| > with full column rank). Then 

\\y(p) ~ s>(p d )\\l = \\sAp-p d )\\l = Wv-vSn, 

where W := Sj,Sy is the Gramian of the system of vectors {Sk}^Li- 

Example 2. In Example\T\ the Gramian S J,S y is diagonal, and the weighted norm correspond- 
ing to the Frobenius norm is given by iw — > Wi l with w = col(l, 2, 2, 1). 

It is not difficult to show that the Frobenius norm in NoteQ]can be replaced by any weighted 
Frobenius norm (weighted 2-norm of vec (y(p) — y(j?d)))- 

2.2. From weighted 2-norm to 2-norm 

Any d SLR At problem with weighted 2-norm can be reduced to an unweighted ( ISLRAI ) prob- 
lem. Consider a change of parameters 

p = p d -£w-iAp. (Ap^_p) 

Then the transformed structure 

^A(Ap) := y{ PA ) - (y (L^-iAp) - So) , G^A) 

for any Ap satisfies 

y A (A P ) = y@). (y A ^y) 

Moreover, 

||f-pd||w = Ap T ^w-iWL^ v _ 1 Ap, 
and \\p — pdllw = II AplH is W is positive definite. In general, the following result holds. 
Proposition 1. Any problem ( ISLRAl i with weighted norm \\ ■ | w is equivalent to 

minimize IIAplH subject to ranky A (Ap) < r. (SLRA A ) 



Proof. The proof is given in Appendix A □ 



3. Variable projection 

3.1. Variable projection for weighted 2-norm 



In this subsection we derive an explicit expression for the cost function (/(i£) I for general 



affine structure and weighted 2-norm. Consider the change of variables ( |Ap — p\ . Then, by 
properties of the vectorization operator and \y A f> y\ , we have 

vec(i?J^(p)) = (I d ® R) vecy(p) = (I d ® i?) (vec^(p d ) - S^L^-tAp) . 
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The optimization problem in can be rewritten as 

minimize ||Ap|| 2 

Ap 

subject to G(J?)L^-iAp = v(R), 

where 

v{R) := (7„ ® R)vec,Y(p d ) = vec (A^(p d )) , 

G(JZ) = G^(i?) := (/„ ® i?)S^ = [vec (i?S*i) ■ • • v. 

Example 3. In Example\T\ for R = [n r 2 ] 

G(i?) = 



(LN) 



(K#)) 
(G^(i?)) 



ri r 2 



Problem (ILNt is a linear least-norm problem in a standard form [ 12, Ch. 6], and has a closed- 
form solution, which we summarize in the following proposition. 

Proposition 2. Let G(R) be of full row ranA0 Then the solution of flLNt exists and is given by 



f(R) = \\Ap*(R)g = v T (R)T-\R)v(R), 



(A P *(R)) 



where 



T{R) := 



ivor) ••• r 1>n (i?y 



r>nd x nd 



TijiR) := RVijR T , (T(i?)) 



_r n ,i(.R) ■•• r nj „(i?)_ 

where the matrices Vjj are d x d blocks of the matrix V 

[Via •■■ Vi, 



V 



V n> l • • • V„,7l 



and the matrix V is defined as 



V = V,^ W :=S, r W" 1 S T 



Proof. Note that by definition 

= (/„ ® R)V{I n ® i? T ) = G(R)W- 1 G T (R). 



Then ( Ap* (i?) 1 coincides with the solution of the least-norm prolem ( ILNb - 



(V) 

(GW- 1 G T ) 
□ 



'if G(R) is not of full row rank but the problem is feasible (the constraint in IQUTERt is consistent), the solution is 
given by f(R) = u T (-R)r(i?) 1 " u{R). 

7 



Example 4. In Example^ 



2 2 
2 2 



2 2 
2 2 



Examples[3]and|4]are generalized in Section l5TI 

The structure of T is determined by the structure of V. For example, if V is block-sparse, 
block-banded or block-Toeplitz, then T has the same structural property. 

Note 2. The representation of type ( |r(_R)| was already derived in jTTlU^/ from statistical con- 
siderations. Here we derive it through a series of linear algebraic transformations. 

3. 2. Variable projection for blocked matrices 

We consider basic transformations of the structures and derive the form of §f(R)\ for these 
transformed structures. First, we consider striped and layered block matrices. 

Lemma 1 (Striped structure). Let p — col(p^\ . . . ,p^), with prn 6 K n * and 

.y(p) = [^W( P W) ... yW(p(^))] , 

be the striped structure, ^ : R n *>° ->• W nxn ', andW = blkdiag(WW, . . ..W^). Then 
1. The cost function \f(R)\ for the striped structure is equal to the sum 

N 

f(R)=J2fl(R), (striped/) 
i=i 



where fi is the cost function \f(R)\ for the structure ,5^i and the weight matrix W"^. 
2. The correction Ap*(R) is the concatenation 

Ap*{R) =col(ApW*(i?),...,ApW*(i?)) (striped Ap*) 

of the corrections ( Ap* (R)\ for the structures S?^> and the weighted matrices W®. 
Proof. The inner minimzation problem ( IQUTERb can be expressed as 

N 

minimize ^ \\ Ap {l) \\\ subjectto R,^ (Ap {l) ) = 0, for I = 1, . . . , N. 

ApW £B n P 1=1 



This sum of squares is therefore minimized by ( striped Ap* i, and its norm is given by ( striped / 1 



□ 



Lemma 2 (Layered structure). Let p = col^ 1 ), . . . ,p^), with p^ € M. n p° and 

y( P ) = : 

be a layered structure, where : W 1 ^ -t R m fe xn . 



1. The G matrix is 

G{R) = [GV>{BP>) ■■■ G^(R^)], 

where R = • • • R(")] is the partition of R into flW € R dxm *. 

2. The r matrix is equal to the sum 

i 

t(r) = ^r (fc) (i? (fe) ) 

fc=l 

of the matrices (T(R) I corresponding to (ISLRAI l with and W^ k \ 

3. Let y( k ) be the matrix dVl) for the structure J^( fc ) and the weight matrix W^ fc ^. Then the 
matrix (TVT> for 5? and W is given by 

blkdiag(Vg Vg). 

4. The correction Ap* (R) can be expressed as 

Ap*(R) = col (ApW*(iJ), . . . , Ap( N >(R)), 

where Ap (fc) *(i?) := L (w(k)) -i(G^) T (T(R)y\(R). 

Proof. It is sufficient to prove statementQ] which follows from 

i i 
^c(R^ip))=Y,^(R {k) ^ {k) (P (k) )) = {G {k) (R {k) )p {k) +vec5^ ) J . 

k=l k=l 

The other statements follow from © and iGW~ 1 G T \ . □ 

Next we examine the effect of left multiplication of the structure by a full-row rank matrix. 
Lemma 3 (Multiplication by $). Let 

5»{p) = 

be an affine structure, where 3" : R n " — > R m xn and $ £ ]g>mxm - s a faU-row-rank matrix. 
Then, 

fy(R) = fMR®)- 
Proof. This property is easily verified by rewriting the constraint in ( j/(i?)[ ) as 

Ry(p) = R$(p) = 0, 

where i?$ is a full row rank matrix. □ 
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4. Cost function and derivatives evaluation 



In this section, we develop algorithm s for computing the cost function (f(R) I and its deriva- 
tives using the representation dt/~ l "r~ 1 i/t . for the general affine structure \y\ The algorithms can 
be specialized for a specific class of structures by deriving the form of as it will be done in 
Section|5]for the mosaic Hankel structure. 

In Section l4~Tl we provide algorithms for evaluation of ( |/(_R)| , computation of the optimal p 
in ( |/(_R)| , and computation of the gradient of 

In Sections [4. 2l and l4~3l we consider the following approximation of Hessian of \f(R)) . Let 
f(R) = \\F(R)\\2, where F(R) € M™ s with n s dm. Then JJJf is an approximation of the 
Hessian, where Jp is the Jacobian of F, This approximation is frequently used in methods of 
solution of nonlinear least-squares problems, e.g. in the Levenberg-Marquardt algorithm 11911 . In 



Section l4~2l we consider the Jacobian F(-) — Ap*(-), where Ap*(-) is defined in ( Ap*(R) I. In 
Section POl we consider F(-) — (Xp where Lr is the right Cholesky factor of T. 

We will frequently use the following notation 

y v := T-^iR), (y u ) 

for the solution of the system Ty v — v(R). 

4.1. Cost function, correction and gradient 

From (Jy^ji and the cost function \f{R)) can be represented as 

f(R) = yl(R)v(R) = v(R) T y„(R), 

and can be computed by AlgorithmQ] 

Algorithm 1 (Cost function evaluation). 

Input: y, W, Vij, p^ R. Output: f(R). 



1 . Compute ( 

2. Compute ( T(R)\ using Vij. 



3. Compute to^ll. 



4. Compute the cost function as iy„ v I. 



The term Ap* (R) in ( Ap* (R) i can be evaluated by Algorithm|2] 



Algorithm 2 (Correction computation). 

Input: y, W, V it j, p d , R. Output: Ap*(R). 

1. Perform steps 1-3 of Algorithm^ 

2. Multiply y v by L-w-iG T (R) on the left. 

The optimal p in \f{R)\ can be computed by combining Algorithm|2]and \ Ap — > p i. 

Algorithm 3 (Computation of p*). 

Input: y, W, Vi.j, pd, R. Output: p*(R). 

1. Perform steps 1—3 of Algorithm^ 

2. Multiply y v by W _1 G T (i?) on the left. 
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Instead of the gradient V/, for convenience, we use the matrix gradient Vj X ,„ / <E R dXT 
defined as 

vec(V dX m /) := V/. 



Proposition 3. Let y v = co\{yu \ • • • , yv) be the partition ofy v into yv £ M. d , and 



Then the matrix gradient is given by 



(i) («) 

vl ■■■ yl 



y u = vecY u . 



<>3=1 



'■J ■ 



Proof. The proof is given in Appendix A 



(V(ixm) 
□ 



From Proposition[3] the gradient can be computed by Algorithm[4] 

Algorithm 4 (Gradient evaluation). 

Input: y, W, Vjj, Pd, R- Output: V ' dxm f(R)- 

1. Perform steps 1—3 of Algorithm^ 



2. Compute the first term 1Y v Sf {pf) of the gradient ( V 



fix I 



3. Compute the second term of the gradient ( V^xm 



4.2. Approximation of the Hessian: Jacobian of Ap* 

The following proposition gives an expression to compute the Jacobian of ( Ap* (R) 

Proposition 4. 



• The Jacobian of ( Ap* (R) I is given by 
dAp 



dR 



8C T 

L w - 1 [G T (R)r- 1 z %3 + —y l/ 



where 



dv dY 



dRij (III,, 



y»- 



• The vector Zij can be expressed as 



sy = row 3 (y(p d )) ®e,- ([zf ] ) ® e t + zjf) 



where 



{dAp*/R i0 ) 



Oil) 



,(2) 



= col ( £ e]Vi, k R T y v k \ ■ ■ ■ , E eJV„, fc i? T y(' 



fe) 



\fc=l 



fc=l 



E(^ fc) ) l col (iZVi^e^ . . . , RV n , k ej) 
11 



fc=i 



Proof. The proof is given in Appendix A 



□ 



Proposition[4]leads to the following algorithm. 

Algorithm 5 (Jacobian evaluation). 

Input: 5f , W, "Vij, Pd, R- Output: Jacobian of Ap* (R). 

1. Compute steps 1—3 of Algorithm\l\ 

2. for j = 1, . . . , m do 
3. 



4. 
5. 
6. 
7. 



Compute 
far j = 1 



Compute ( z. 



,ddo 

and ( p7J) - 



Set x 



Set 



dAp* 



x + dR tj 

L w ~iX. 



9. end for 
10. end for 

4.3. Using Cholesky factorization 

In this section, we show that the cost function and an approximation of the Hessian can be 
computed using the Cholesky factorization of T 

r = LrpLr. 



The Cholesky factorization yields a numerically reliable way 11511 to solve the system of equa- 
tions Tu — v using the following identity 



Algorithm 6 (Solve system Tu = v). 
Input: T, v. Output: u. 



1. Compute the Cholesky factor Lr ofT. 

2. Solve L^F V = v and L^u = F u by backward substitution. 

Moreover, the cost function can be represented as 

f(R) = \\(Lj (R) )- 1 u(R)\\l 



1^ HID 



which leads to a more efficient algorithm for the cost function evaluation than AlgorithmQ] 

Algorithm 7 (Cost function evaluation using Cholesky factorization). 

Input: 5?, W, Vjj, pd, R- Output: f(R). 



1 . Compute ( 

2. Compute ( 



v(R) 



T(R) 



using Vi.j. 



3. Compute the Cholesky factor Lr- 

4. Solve LjF v {R) = v(R). 

5. Compute f(R) = \\F v (R)\\l 
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In addition, equation ( \\L~p v ||| I defines a representati on of the cost function as a sum of 
squares, which is more compact and easier to compute than (l^ T r~ 1 ^t . Indeed, Ap*(R) <E M. n p, 
but F U {R) £ M. nd . However, the elements of the Jacobian of F v 



dFu 
dRa 



dRa 



u(R), 



cannot be computed analytically due to the need of differentiation of (Lj) 1 . For this purpose 
the pseudo-Jacobian can be used 



dRa 



(-^r(fl)) 



.! du(R) 
dRa 



1 r)T 



{d^F v /dR l3 ) 



In ll2()ll it was shown that for functions of the form di/ T r~ 1 t4 . the pseudo-Jacobian yields the 
stationary points of f( R) and gives an approximation of the Hessian of /. 

It is easy to see that {d^'F v / dRij i and (Jz^JJ differ only by a constant factor in one summand. 
Therefore, 



d^F v 



r(R), 



-i Jp) 



e» + z 



(2) N 



(^) 



zf) :=row 3 -(^fo))® ei -i((*f>). 

The resulting algorithm is Algorithm[8] 

Algorithm 8 (Pseudo-Jacobian evaluation). 

Input: , W, Vi.j, Pd, R- Output: pseudo-Jacobian. 

1. Compute steps 1—2 of gradient evaluation using Algorithm® for solution of system of 
equation. 

2. for j = 1, . . . , m do 
3. 



Compute 
far j = 1 



Compute ( z. 



Multiply 



7. end for 

8. end for 



,ddo 

and 



,(2) 



» 



on the left by (L^/ R \) 1 , using the precomputed Cholesky factor. 



5. Weighted mosaic Hankel structured low-rank approximation problem 



In this section, we establish the form of T(R), Vij and f(R) for the structure ( ^J^ m ,n I and 
weight matrix 

W = blkdiag(W (1 ' 1) ,...,W feA ' ) ). (blkdiagW) 
In view of Lemmae[]J[3] we can consider only scalar Hankel structure. 



13 



5.1. Scalar Hankel matrices 
Let 



J rr 



1 0' 
: '•• '•■ 







be the right shift matrix for a row vector, i.e. 

[x\ • • • ^m] Jm — [0 X\ ' ' • X m _ij 

Lemma 4. For scalar Hankel structure ( |S ^) has the following form 

[Im 0}j° np 



I 1 ™ o]JT\ 



and the matrix G ,^> m has the structure 



Rl i?2 • • • Rm 



Rr, 



vndxn D 



Rl i?2 

where R = [R x ... R m ] eR mxd , with R l E R d . 
Proof. Indeed, for a vector p — [pi ... p np \ T , we have 

[Im 0] J* p p = [pu+1 ■ ■ ■ Pk+ m ] 

Therefore, ( Ivec holds with So = 0. 



By substituting ( S j%> m ~ i into ( G,y(R) i the equality ( G 



can be verified. 



{.Im/ 



(Sje> m .„) 



([I 0}J k ) 
□ 



We next consider the case of Hankel low -rank approximation with weighted 2-norm. 
Proposition 5. For Hankel structure and a weight matrix W the matrix (TVT > is equal to 

where (W _1 )™ Xm is a m x m submatrix o/W -1 , starting from element (i, j). 



Proof. From (0 and ( S j#> m ~ I, we have that 

v id = [i m o] j^w-^j^r 1 







From ( [/ 0] J i one can verify that ( V^j (o?C,n) • corresponds to selecting an m x m submatrix 
ofW"T □ 
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We will call a symmetric matrix s-banded ( s-block-banded) if all upper diagonals (resp. 
upper block diagonals) starting from s-th are zero. 

Corollary 1. 1. If W _1 is s-banded then V and Ty are (to + s — 1) -block-banded (with 
d x d blocks). 

2. 7fW _1 is Toeplitz then V and T are block-Toeplitz. 

3. For the case iw — > W"l) V and T are m-block-banded. In particular, 

(a) ForW — Id (ordinary 2-norm), 



{Jiy~\ forj>i, 
{J m y- j , farj<i. 



(b) For W = diag(u>i , . . . , w Up ) ( elementwise weights and fixed values), 



diag(7 i ,...,7, +m _i)(J^) : ' \ 



(v^) T , 



for j > i, 
for j < i, 



where 7,; :— w i . 



5.2. The main theorem 

By LemmaQ] we can consider only the layered Hankel structure 



where m = [mi 



Theorem 1. For structure \^ m . [n] \ and W = blkdiag(W (1) , . . . , W (?) ), with (W^)" 1 being 
-banded 

1. The matrix V is [i-block banded with d x d blocks, with /! := maxfcjm/j + £>W — 1}. 



2. IfCyftk) are all Toeplitz, then T is block-Toeplitz. 

3. For the case iw — » Wi and W^ k ' = diagw/ fe \ 10 € 
r jff m n (R) are block-banded with block bandwidth 

jj, := max{TO/}' =1 . 

In particular, 
(a) the matrices Vij for j > i can be expressed as 



the matrices V j#> m n and 



Vi d = diag(7 



(1) 



Tffin,-!, • • .,7i w , ■ • - ,7i?i n ,- 1 )(Ji) , - t , (V i , j (^? n>W )) 



„(?) 



„(?) 



w/iere = (w^ ) 1 ant/ J m := blkdiag(J mi , . . . , J m ); 
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(b) if the weights are constant for the blocks of ( I, i.e., 

w = co\ (w 1 l n m,...,w q l n i q) j , 
then Vje m „ is block-Toeplitz, i.e., (V^ m tl )jj = Vj-<, and 

V fe = diag(7i/ TOl , . . . , 7<?^m,)(^m) fc 7i : = ^f 1 - 

Proof. Theorem[T]follows from Lemma[2]and Corollary [l] □ 

5.3. Complexity of the algorithms for the mosaic Hankel structure 

In this section we use Theorem Q] and the fact that the Y matrix is -block-banded. We 



consider only the case of layered Hankel structure {^ mi \ n A and count only the number of mul- 
tiplications. The number of additions is typically less than the number of multiplications. 

Lemma 5. The complexity of the steps in the Algorithm\6\is given by 0(d 3 [i 2 n) for Step 1 and 
0{d 2 ixn) for Step 2. 

Proof. The Y £ ^_ ndxnd j s (^d)-banded, and the complexity of both steps is given in II 1511 . □ 



Theorem 2. The complexity of cost function evaluation using Algorithm\7\and gradient evalia- 
tion using Algorithm^is (d 3 fj,mn). 



Proof. The proof is given in | Appendix A □ 



Note 3. Due to Lemma\J\ for mosaic Hankel structure Jif miI1 with n = \n-y ■ ■ ■ tin] the 
complexity of cost function and gradient evaluation is 0(d 3 pmn). 

Note 4. The computation on the Step 2 in the Algorithm\2\has complexity 0(mnd). 

Theorem 3. Jacobian/pseudo-Jacobian have the computational complexity 0(d 3 m 2 n). 



Proof. The proof is given in Appendix A □ 



Next, we show that the complexity is linear in m for the cost function and the gradient in 
the case of block- wise weights (Theorem [TJ p. l3bl . In the cost function evaluation, the most 
expensive step is the Cholesky factorization, which can be performed in linear in m number 
of operations in this case l2lll . The computation of the gradient can be also simplified due to 
block-Toeplitz structure of Y. 

Proposition 6. Let y v = Y~ 1 s(R), and 

y v = vecY v , Y v :- 



(i) («) 
Vu' ■■■ Vu' 



be the partition of y v into a sequence of subvectors of length d. Under the conditions of Theo- 
rem^ p. \3b\ the gradient ( |Vrfxm| > can be simplified to 

V dxm (/) = 2Y v y T { Pd ) - 2^N RV + ( N k^k + N^RV k )) 

0<fc<min(n,/i) 

where := Y„( J^) k (Y v ) T , where J n is defined in ( | J m \ . 
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Proof. The proof is given in Appendix A □ 



Using Proposition^ the following theorem can be proved. 

Theorem 4. Under the conditions of Theorem\l\ p.\3b\the complexity of cost function and gra- 
dient evaluation is equal to 0(d 3 mn). 



Proof. The proof is given in | Appendix A □ 



Note 5. Consider mosaic Hankel matrices ( Jtf'm^ I and block-wise weights, which are constant 



along the block rows of the mosaic Hankel matrix, i.e. 

W = COI [Wxl (i,i), . . .,W q l (,,1), . . (!,»),. . («,»).) 

y IL p IL p '*>p ftp j 

Let the be the T matrix for the structure ,j%? m ^[n, ], as in Lemma\T\ Then all T^> are subma- 
trices ofT^ lmaa: \ where ni max > n; for all I. Therefore, only Chole sky factorization ofT^ lma ^ 
needs to be computed. 

5.4. Numerical examples 

First, we consider a family of 2 x 2 mosaic Hankel matrices 

^fmi m2] [m n 2] ,mi = 20, m 2 = 22, [ni na] = [250,255] + 250/c, k = {0, . . . , 10}. 

(EX1) 

In the case of 2-norm (w = 1) we compute the cost function and its derivatives using the SLRA 
package jfjj]. Hereafter, we use the term "element- wise variant" for the algorithms that treat 
weights as different values (implemented in the WLayeredHankelStructure C++ class). 
We use the term "block-wise variant" for the algorithms that utilize Theorem [4] (block-Toeplitz 
structure of V), and is implemented in the LayeredHankelStructure C++ class. In these 
examples,we consider only rank reduction by 1 (r = m— 1). 

Fig. Q] shows the time needed for computation of the cost function, gradient and pseudo- 
Jacobian for dEXU . with k = {0, . . . , 10}. The computation time is divided by n and plotted in 
logarithmic scale. In all cases the computation time is bounded by a linear function in n. This 
empirical observation agrees with Theorem[2]and Theorem[4] 

Next, we show the computation time for variying to, on examples of scalar Hankel matrix. 
In Fig. |2] and Fig.[3]the computational time is plotted for element-wise and block-wise variants, 
as explained above. From Fig.[3]one can see that the computation time for the cost function and 
gradient in the element-wise variant the computation time grows quadratically in to, which is in 
agreement with Theorem|2] 

For the block-wise variant, Fig.[2]shows the computational time for the cost function and the 
gradient is growing faster than linearly in m (in contrast to Theorem|4]i. The computation of the 
pseudo-Jacobian is also growing faster than quadratically in m (in contrast to Theorem[3j. This 
effect can be expected because in the current version of the software products by Vk,i matrices 
are implemented as products by precomputed matrices, and not as elementwise products. 
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Figure 1 : Logarithm of computation times divided by n for cost function, gradient and pseudo-Jacobian, depending on 
n. "el" denotes element-wise variant and "bl" denotes block-wise valiant. 



6. Conclusions 



In this paper we considered the structured low -rank approximation problem for general affine 
structure, weighted 2-norm and fixed values constraints. We used the variabl e projectio n princi- 
ple, which has many advantages when applied to ( I SLR Al l. The T matrix in is struc- 
tured and its structure is determined by the original matrix structure. For the mosaic Hankel 
structure the T matrix is block-banded/block-Toeplitz, depending on the structure of 



the weight matrix. This allows us to evaluate the cost function f(R) and its derivatives can be 
evaluated in 0{m 2 n) flops (0{mn) for the cost function and the gradient if T is block-Toeplitz). 
The approach can be applied for other matrix structures , where the structure of T (e.g. sparse- 
ness) can be exploited for efficient computations. 

Whenever possible, we considered the most general cases (structures, weights) and developed 
the algorithms for the general affine structure. We showed how the cost functions for the blocked 
structures (layered, striped) can be expressed through the cost functions of the blocks. This 
allowed us to reduce the mosaic Hankel case to the scalar Hankel case and helped to simplify the 
derivations of the algorithms and their complexities (compared to 111 l[ll7IO . We also considered 
the fixed values constraints as a part of the weighted norm, which also simplified the treatment 
of this case. 
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Figure 2: Computation time divided by m for the cost function, gradient and pseudo-Jacobian, for structure ,%in,2000 
and block-wise valiant. The computation time for the pseudo-Jacobian is divided by 20. 
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Appendix A. Proofs 



Proof of Proposition\I\ All admissible p for ( ISLRAI ) can be parametrized as ( |Ap — > p I, Indeed, 
if W _1 is positive definite, then L^-i is nonsingular and \Ap — > p| is an invertible affine trans- 
formation W lp — » W lp . If W is given by iw — s> Wl) . then L w -i = diag^y 77 /!, . . . , and 



the parametrization ( Ap — > p i keeps p/c = (pd)fc for fixed values k : Wk = +oo and runs over all 



possible pk for other fc. 

After substitution of ( |Ap — >■ p i into tJ^i and 



one obtains 



^(p)=«^a(Ap). 



For positive definite W we have 



\ a p\\1 = \\P-Pd\\w, 



and therefore ( I SLR Al l with weighted norm | • 1 1 w 
(Iw ->■ W> . then 

l|Ap||2 = ||P-fldllw + 



is equivalent to ( [SLRAa1 >. If W is given by 



{fc: w k =+oo} 



(A.l) 
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Figure 3: Computation time divided by m for the cost function, gradient and pseudo-Jacobian, for structure ,%in, 2000 
and element-wise variant. The computation time for the pseudo-Jacobian is divided by 50. 



and ( ISLRAb is equivalent to 



minimize CO} subject to rank^A(Ap) < 

Ap£R n i> 



(A.2) 



It is straightforward to see that S^a ( Ap) does not depend on the entries corresponding to fixed 
values. By checking the first-order optimality conditions one can see that all local minima of 
( IA.2I ) satisfy Apk = for Wk = +oo. The local minima of iA.2i therefore coincide with the 
local minima of ( |SLRAaI >- D 

Proof of Proposition]^ First, note that if the differential of / is represented as 

df(R,H)=tr(AH T ), 



then Vdxr n(f) = A. 

From ( ylv i the differential is given by 

df(R, H) = 2yJdv(R, H) - yJdT(R, H)y v . 



(df(R,H)) 



The first term in (df(R, H) i is equal to 

2yJdu(R, H) = tr (2Y u (Hy( Pd )) T ) = tr (2Y u y T ( Pd )H 1 
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The second term in ( df(R, H) i is equal to 

n 

yJdT(R,H)y„ = E (y^) T dT^R, H)y<p , 

n 

= J2 (yP) T (.BV id R T + RV i , j H T )y^ 



n 

E tr(( y W(j/«) T i?V^ i + #(^ ) ) T ^V i;i )F T ), 



which in combination with Vij — Vj i yields ( jVdxm i. □ 



Proof of Proposition^ The equation ( dAp* /Rij I can be obtained by differentiation of ( Ap* (R) 
The first summand in (Jz^Jj) can be expressed as 

dv , 



dRij 



vec e,e. 



y{p A )) = row j(y(p d ))®ei 



The second summand is 

dT 

-Vv = (oi, . . . ,a„), 



dR^ 



where 



a * = E i^ fc) = E(^ e Jv ; ,^ T + WterffrP 



dRij 

fe=i J fc=i 

n 

= E( e 7 v ^ flT ^ fe) ) e « + (y { v\RViMe r □ 
fe=i 

Proof of Theorem^ First, we provide the complexities for each individual step of Algorithm[7] 



1. Computation of ( v{R) I amounts to multiplication of a d x to matrix Rby m x n matrix 
,5^(p), which has computational complexity 0(dmn). 

2. We need to compute fin matrices T^j. By Theorem[T]each Vij has < to nonzero elements 
and Vij can be computed with 2d 2 m multiplications, therefore the total computational 
complexity is 2d 2 mfin. 

3. By Lemma|5]the complexity of this step is 0(d 3 fi 2 n). 

4. By Lemma|5]the complexity of this step is 0(d 2 fin). 

5. The number of multiplication needed for computing a norm of a vector is nd. 
Second, we provide the complexities for the steps of Algorithm|4] 

1. By the above derivations the complexity of this step 0(d 3 p 2 n). 

2. The multiplication of a d x n matrix by n x to matrix has complexity nmd. 

3. We need to compute 2 fin products y^\y\f l ) T RNij. By Theorem Q] the matrix Vj,j has 
only one nonzero diagonal and the product u = {yv ) T RVi,j can be computed in dm 
steps. The product y v u also takes dm steps. Therefore the total computational complex- 
ity of this step is 0(dfimn). □ 
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Proof of Theorem\3\ For each 
products of the form 



due to block-bandedness of V, we need to compute 2/in 
eJV llk R T yW=bl k ,yi k \ 



By Theorem[TJ eJVi y kR T = bj k ( , corresponds to elementwise multiplication of a column of 
R by elements from 74, which takes d multiplications. Another d m ultipl ications are needed to 



(k) 

compute the inner product of bj t k,i and y v . The computation of (2 
which lead to Q(d\ imn) number of multiplications 



(i) 



is repeated m times, 



For each ( z. 



,(2) 



, we need to compute 2fj,n products of the form (yi )iRVi,k&j, where each 



. (2)1 

product, as in the previous step, has also complexity 2d. The computation of ( z\- ' 1 is repeated 

md times, which leads to 0(d 2 [imn) complexity. 

For pseudo-Jacobian, we need to solve md times the banded system with the Cholesky factor 
and Zij, which has complexity 0(d 3 fimri). For Jacobian, we also need to multiply each 

r _1 Zjj from the left by G T R, which has complexity nmd by Note|4] Therefore, this step has 

additional complexity 0(d 2 m 2 n). □ 

Proof of Proposition® The second term in ( Vdxm 



can be represented as 



M 



E E vPW) t ™<. 

k=-M j-i=k 

In this case, Vij — Vj-j. It is easy to see that for j > i 



n — k 



E vP<vP) T - E ^ +fe) (^) T = yAJ:)\y») t = Nk 

j — i=k i=l 



□ 



Proof of Theorem^ Since only /1 matrices Tk need to be computed, step 2 of Algorithm[T]can 
be performed in 2d 2 mp flops. For Cholesky factorization (step 3) the algorithms exploiting the 
Toeplitz structure lulll can be used. Their complexity is 0(d 3 /in). 

Here we use Proposition^ Each Nk can be computed in d 2 n operations the computation of 
the product NkRVj has complexity 0(d 2 rn) due to the sizes of the involved matrices. Therefore 
the total computational complexity of Step 3 of Algorithm|4]if 0(d 2 iin). □ 
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